Differences in Performance Schemata as a Function of
Organizational Level

Abstract

The strategy of training raters to adopt the same evaluative standard has
become a common practice in laboratory performance appraisal research.
We felt that in the applied setting this "frame-of-reference" rater
training strategy should be expanded to include the ratees' standards in
order to clarify workers' understanding of organizational expectations.
This study explored the necessary foundations for using this rater and
ratee frame-of-reference training strategy. A modified behaviorally
anchored scaling method was used to gather data in two law enforcement
agencies.  The  goals  of  the  study  were  to  identify  performance  schemata 
for  the  position  of  patrol  officer,  and  to  assess  how  the  schemata 
differed  by  organizational  level  (i.e.,  patrol  officers  versus  their 
supervisors).  Data  were  analyzed  using  repeated  measures  analyses  of 
variance  and  discriminant  analyses.  Differences  in  performance  schemata 
between  organizational  levels  were  tentatively  identified.  Findings  were 
discussed  in  relation  to  the  needs  of  the  two  agencies  and  in  terms  of 
general implications for rater training strategies.

To date, the social cognitive approach to performance appraisal has
not  generated  many  innovations  for  applied  practices.  Banks  and  Murphy 
(1985)  have  even  suggested  that  the  social  cognitive  approach  may  be 
"widening  the  gap"  between  the  laboratory  and  practice.  However,  the  one 
practical  notion  emerging  from  this  approach  is  the  training  of  raters  to 
adopt  the  same  evaluative  standard  to  use  as  the  comparison  for  judging 
ratee  performance.  Such  an  evaluative  standard  represents  a  performance 
schema.  Taylor  and  Crocker  (1981)  define  a  schema  as  a  cognitive 
structure  consisting  of  representations  of  some  defined  stimulus  domain 
(in this case, the job in question). A schema contains general knowledge
about  the  domain  including  a  specification  of  the  relationships  among  its 
attributes  as  well  as  specific  examples  or  instances  of  the  stimulus 
domain. 

Under the rubric of frame-of-reference training, Bernardin and
Buckley (1981) were the first to propose training raters to use the
appropriate evaluative standard, and recent laboratory studies have shown
the potential utility of this strategy (McIntyre, Smith, & Hassett, 1984;
Pulakos,  1984).  The  proposed  advantage  of  frame-of-reference  training  is 
that  teaching  all  raters  to  use  the  same  evaluative  standard  would  result 
in  more  accurate  and  consistent  ratee  evaluations. 

In our opinion, there is another potential advantage to
frame-of-reference training. Just as supervisors have their implicit
notions about what successful and unsuccessful job performance entails,
so do the workers performing the job. If supervisors and workers differ
significantly in terms of their evaluative schemata, it seems apparent
that workers would view performance appraisal as an unfair and even
arbitrary process (cf. Landy, Barnes, & Murphy, 1978). Therefore, it
seems that practitioners attempting to use some variation of
frame-of-reference training should not only train raters to adopt the same
evaluative  standard,  but  they  should  also  train  the  workers  to  understand 
the  standard  by  which  supervisors  are  judging  their  performance. 

The  purpose  of  this  paper  was  to  explore  the  necessary  foundations 
for  the  application  of  the  strategy  outlined  above.  This  study  was 
conducted  within  two  law  enforcement  agencies  contemplating  using  this 
rater and ratee frame-of-reference training. Our goals for the current
study were twofold: first, to generate patrol officer performance
schemata from both supervisors and patrol officers, and second, to
assess how the schemata differed by organizational level. Recent
research with the military has explored possible methods for generating
performance schemata (e.g., Foti, 1987), and the current study applied such
methods  for  the  identification  of  the  patrol  officer  performance 
schemata.  Concerning  the  issue  of  differences  by  organizational  level, 
Landy,  Farr,  Saal,  and  Freytag  (1976)  provided  general  evidence  that 
small  but  significant  differences  could  be  expected  between  performance 
expectations of patrol officers and their supervisors. We were concerned
with identifying potential differences in two respects. First, we
assessed whether disagreements in judging specific performance incidents
were associated with particular dimensions of performance. Second, we
assessed whether the level of performance represented by the behavioral
incidents moderated the occurrence of differences in the judgments
between patrol officers and their supervisors (i.e., whether differences
in judgments were to some degree a function of items representing below
or above average performance).

Subjects

Data  were  collected  from  a  municipal  police  department  and  the 
surrounding  county  sheriff's  department.  All  analyses  were  performed  on 
the  responses  to  the  final  phase  questionnaire  which  was  completed  by  82 
patrol  officers  and  supervisors.  The  breakdown  of  subjects  was:  42  city 
patrol  officers,  19  city  supervisors,  16  county  patrol  officers  and  4 
county supervisors. Supervisors from both organizations held the rank of
sergeant or lieutenant and evaluated patrol officers on a regular basis.

Procedure

A  modified  behaviorally  anchored  scaling  method  was  used  to 
generate  the  performance  schemata.  The  procedural  modification  was  that 
no  items  were  discarded  throughout  the  procedure.  Those  items  not 
meeting  the  allocation  criterion  were  still  used  in  the  item  scaling 
questionnaire  and  no  anchor  retention  criterion  was  used  because  the 
purpose  of  this  study  did  not  involve  the  creation  of  a  behaviorally 
anchored rating scale. At an initial conference with eight patrol
officers and supervisors, it was decided that the nine peer rating scales
used by Landy et al. (1976, p. 752) were applicable to both agencies.
At the next meeting, those eight police officers and two members of the
research team generated 152 behavioral performance incidents. A
subsequent sample of 10 patrol officers and 10 supervisors participated
in the allocation phase. A 60% criterion was used to allocate items to
dimensions.  With  the  number  of  items  allocated  per  dimension  in 
parentheses,  the  results  of  the  allocation  phase  were:  (a)  job  knowledge 
(13), (b) judgment (9), (c) use of equipment (7), (d) dealing with public
(7), (e) reliability (26), (f) demeanor (14), (g) compatibility (23), (h)
communication (16), and (i) work attitude (13). The 24 items not meeting the
allocation  criterion  were  placed  on  a  dimension  labeled  "unassigned". 
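
The allocation rule described above can be illustrated with a minimal sketch. It assumes that an item is assigned to a dimension when at least 60% of the judges place it there; the text does not say whether the boundary was inclusive, so the `>=` comparison is an assumption, as are the function name and the example judgments, which are hypothetical.

```python
from collections import Counter


def allocate_item(judgments, criterion=0.60):
    """Return the dimension chosen by at least `criterion` of the judges.

    `judgments` is a list with one dimension label per judge. Items that
    no dimension claims strongly enough go to "unassigned".
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    # Most frequently chosen dimension and its number of votes.
    dimension, votes = Counter(judgments).most_common(1)[0]
    # Inclusive boundary (>=) is an assumption, not stated in the paper.
    if votes / len(judgments) >= criterion:
        return dimension
    return "unassigned"


# Hypothetical judgments from 20 judges (10 officers, 10 supervisors).
clear_item = ["reliability"] * 14 + ["work attitude"] * 6
split_item = ["demeanor"] * 9 + ["compatibility"] * 8 + ["communication"] * 3

print(allocate_item(clear_item))
print(allocate_item(split_item))
```

In this sketch the first item is agreed on by 70% of judges and is allocated, while the second peaks at 45% and falls to the unassigned dimension.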

The  final  phase  was  the  standard  item  scaling  questionnaire  (i.e., 
subjects  rated  the  level  of  performance  each  item  represented  for  its 
assigned  dimension).  For  the  unassigned  dimension  subjects  were  informed 
that  the  items  did  not  fit  neatly  into  the  other  nine  dimensions,  but  the 
ratings of the items were still needed. All items were rated on a 7-point
scale from 1 (unsatisfactory) to 7 (excellent).

Dependent  Variables 

For  all  analyses,  responses  to  the  item  scaling  questionnaire 
served  as  the  dependent  variables.  In  order  to  explore  possible 
differences  between  patrol  officers  and  supervisors  concerning 
perceptions  of  good  and  poor  performance  incidents,  good  performance 
items  were  analyzed  separately  from  poor  performance  items.  For  each 
performance  dimension,  items  with  an  overall  sample  mean  of  less  than 
four  were  considered  the  poor  performance  incidents,  and  items  with  means 
greater  than  four  were  considered  good  performance  incidents. 
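
As a rough illustration of this split, the sketch below classifies items by their overall sample mean relative to the scale midpoint of four. The item labels and ratings are hypothetical, and because the text does not say how an item with a mean of exactly four was handled, the sketch simply sets such items aside.

```python
from statistics import mean

SCALE_MIDPOINT = 4  # midpoint of the 7-point scale used in the questionnaire


def split_items(ratings_by_item):
    """Split items into good and poor incidents by overall sample mean.

    `ratings_by_item` maps an item label to the ratings from all subjects.
    Items whose mean equals the midpoint are returned separately, since
    their treatment is not specified in the text (assumption).
    """
    good, poor, at_midpoint = [], [], []
    for item, ratings in ratings_by_item.items():
        item_mean = mean(ratings)
        if item_mean > SCALE_MIDPOINT:
            good.append(item)
        elif item_mean < SCALE_MIDPOINT:
            poor.append(item)
        else:
            at_midpoint.append(item)
    return good, poor, at_midpoint


# Hypothetical ratings for three items on one dimension.
ratings = {
    "arrives on time": [5, 6, 5, 6],
    "falls asleep on duty": [1, 2, 1, 1],
    "files routine report": [4, 4, 4, 4],
}
good, poor, at_midpoint = split_items(ratings)
print(good, poor, at_midpoint)
```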

Analyses 

Analyses  were  conducted  in  two  phases.  First,  2  (Level)  X  n 
(Item)  repeated  measures  analyses  of  variance  (ANOVA)  were  conducted  for 
the  good  and  poor  incidents  on  each  performance  dimension.  The  Level 
factor  represented  patrol  officers  versus  supervisors  and  Item 
represented  the  repeated  measure  factor  of  number  of  items.  To  achieve  a 
clearer  notion  of  the  strength  of  the  effects,  the  second  phase  of 
analyses  involved  a  series  of  discriminant  analyses  that  predicted 
organizational level from the responses to the item scaling
questionnaire.

Repeated Measures ANOVAs

Other than the main effects for the item factor (which simply meant
that some items were rated differently from others), the 10 ANOVAs (one
per dimension) for the good performance incidents provided only one
significant effect.¹ The 12 good performance incidents on the
reliability dimension exhibited a level by item interaction,
F(11, 69) = 3.22, p < .001. Examination of the item means revealed that
for six items, supervisors rated the incident higher than patrol
officers, while the opposite pattern occurred for the other six items.

For the poor performance incidents a clear trend emerged. Results
of the analyses appear in Table 1.² The level effect was significant for
reliability and compatibility, and marginally significant for dealing
with the public, work attitude, and unassigned. Also, there was a
marginal level by item interaction for job knowledge. Disagreement was
clearly greater for below average performance incidents. Examination of
item means on these dimensions showed that close to 100% of the time,
supervisors judged the performance incidents more stringently than the
patrol officers. The interaction for job knowledge was caused by one
item where patrol officers judged the incident much more harshly than
their supervisors.
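
The comparison of item means by level can be sketched as follows. This is only an illustration of the tally, under the assumption that "more stringently" means a lower supervisor mean on a poor performance incident; the function name, item labels, and ratings are hypothetical.

```python
from statistics import mean


def proportion_supervisors_harsher(officer_ratings, supervisor_ratings):
    """Share of items on which the supervisor mean is below the officer mean.

    Both arguments map an item label to a list of ratings from that group.
    A lower rating of a poor performance incident is treated as the more
    stringent judgment (assumption).
    """
    items = list(officer_ratings)
    harsher = sum(
        1
        for item in items
        if mean(supervisor_ratings[item]) < mean(officer_ratings[item])
    )
    return harsher / len(items)


# Hypothetical ratings for four poor performance incidents.
officers = {"a": [3, 3, 4], "b": [2, 3, 3], "c": [3, 2, 2], "d": [2, 2, 2]}
supervisors = {"a": [2, 2, 3], "b": [1, 2, 2], "c": [2, 1, 2], "d": [3, 3, 2]}

share = proportion_supervisors_harsher(officers, supervisors)
print(share)
```

With these made-up ratings, supervisors are harsher on three of the four items.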

In  summary,  the  results  demonstrated  that  in  terms  of  dimensions, 
disagreement  between  organizational  levels  was  greatest  for  reliability. 
More importantly though, the findings showed a clear trend for
supervisors to judge poor performance incidents more stringently than
their subordinates.

Discriminant Analyses

Next,  10  discriminant  analyses  (one  per  dimension)  were  run  using 
individual  items  as  predictors  of  organizational  level.  A  stepwise 
method  was  used  to  select  items  into  the  equation  based  on  their 
discriminating power. The criterion for entry was Mahalanobis distance,
which seeks to maximize the distance between groups (Cooley & Lohnes,
1971). The high agreement across levels for the dimensions of judgment,
use of equipment, and communication resulted in nonsignificant
discriminant functions. The results for the remaining seven dimensions
are summarized in Table 2. While each of these dimensions provided
significant discriminant functions, only reliability and the unassigned
dimensions could accurately predict supervisors more than 50% of the
time. Due to the relatively small number of available subjects, it was
not possible to cross-validate the discriminant functions. However, the
consistency of the results across all analyses suggests that the
differences between organizational levels were not large, but they were
meaningful.
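
To give a sense of the entry criterion, the sketch below shows a simplified first step of a stepwise selection: each item is scored by the squared Mahalanobis distance between the two group means, and the item with the largest distance enters first. This is a univariate simplification, not the procedure used in the study, because later steps of a stepwise analysis must account for the covariance among items already in the equation. The function names and data are hypothetical.

```python
from statistics import mean, variance


def mahalanobis_d2(group_a, group_b):
    """Squared distance between two group means on a single item.

    With one predictor, Mahalanobis D-squared reduces to the squared mean
    difference divided by the pooled within-group variance. The pooled
    variance must be nonzero.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) ** 2 / pooled


def best_first_item(officers, supervisors):
    """Return (item, D2) for the item that best separates the two groups."""
    d2 = {
        item: mahalanobis_d2(officers[item], supervisors[item])
        for item in officers
    }
    best = max(d2, key=d2.get)
    return best, d2[best]


# Hypothetical ratings of two reliability items by each group.
officer_items = {"absent": [3, 4, 3, 4], "late": [2, 3, 2, 3]}
supervisor_items = {"absent": [1, 2, 1, 2], "late": [2, 3, 3, 2]}

item, distance = best_first_item(officer_items, supervisor_items)
print(item, round(distance, 2))
```

In this example the groups differ only on the "absent" item, so it would be the first candidate to enter.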

Discussion 

Results  of  the  analyses  were  reviewed  with  four  supervisors  and  one 
patrol  officer.  They  felt  that  the  findings  accurately  reflected 
differences  between  organizational  levels.  According  to  these  officers, 
dimensions  where  agreement  was  high  (e.g.,  judgment,  use  of  equipment, 
communication)  tended  to  be  the  performance  dimensions  heavily  emphasized 
in  the  police  academy  training  program.  They  also  felt  that  dimensions 
where  disagreement  was  high  (e.g.,  reliability,  unassigned, 
compatibility,  work  attitude)  reflected  two  phenomena:  (a)  these 
performance dimensions (and behaviors for the unassigned dimension) were
not emphasized in formal training programs, and more importantly, (b) they
were performance dimensions where poor performance on the part of patrol
officers would be salient to higher levels of the command chain and would
probably cause negative perceptions about the supervisor's capability to
lead their men/women.

The  clearest  example  of  this  latter  explanation  was  the  reliability 
dimension. Examination of those behaviors where item means were most
discrepant demonstrated a clear pattern. Patrol officers valued proper
attendance  behaviors  (e.g.,  coming  to  work,  being  punctual,  and  proper 
work  breaks)  more  positively  than  their  supervisors.  Supervisors  saw 
them  as  more  average,  expected  behaviors.  Also,  supervisors  viewed 
improper  attendance  behaviors  (excessive  absenteeism,  tardiness,  etc.) 
more  negatively  than  patrol  officers. 

At  a  more  general  level,  the  findings  of  this  study  hold  many 
implications for performance appraisal research. First, our modified
behaviorally anchored scaling procedure appears to be a reasonable vehicle
for generating a performance schema. However, it is not the only method
(cf. Borman, 1983; Lord, Foti, & DeVader, 1984). A key advantage of
our method is the identification of meaningful performance behaviors
where there is maximal disagreement (i.e., the unassigned dimension). As
Nathan and Alexander (1985) suggested, the items retained on a
traditional behaviorally anchored rating scale are probably the least
informative due to the level of agreement necessary to be retained.

The  current  study  also  provides  insight  into  where  performance 
schema  differences  between  supervisors  and  their  workers  are  likely  to 
occur. Future investigations should focus on two areas. First,
ambiguous performance dimensions would be a good starting point. By
ambiguous we mean those dimensions in which content is naturally fuzzy
(for  example,  work  attitude)  and/or  those  dimensions  that  are  not 
emphasized  during  employee  training  and  orientation.  Second,  further 
assessment  is  needed  of  the  notion  that  disagreement  is  more  likely  to 
occur  in  relation  to  below  average  performance. 

In  conclusion,  we  propose  that  our  rater  and  ratee  variation  of 
frame-of-reference  training  has  potential  utility  in  organizational 
settings.  The  current  exploratory  study  has  demonstrated  a  feasible 
method  of  meeting  the  prerequisites  for  this  training  strategy,  namely, 
identifying  detailed  performance  schemata  and  suggesting  where 
differences  between  organizational  levels  occur.  Future  research  is 
needed to assess, in organizational settings, the benefits of
frame-of-reference training in terms of improved supervisor ratings,
improved ratee performance, and improved satisfaction with the
performance appraisal process for both supervisors and ratees.

¹ Item means and standard deviations are available from the second
author.

² For parsimony, the item main effects were not reported in Table 1.

Table 1

Repeated Measures Analyses of Variance
for Poor Performance Incidents

                                           Level     Level X Item
Dimension              Number of Items       F         approx. F

Job Knowledge                 5             2.43         2.26*
Judgment                      4             1.59          .10
Use of Equipment              3              .17         1.29
Dealing With Public           5             3.55*        1.74
Reliability                  14             3.95**        .98
Demeanor                      4             2.64         1.87
Compatibility                12             5.86**       1.54
Communication                 8             2.52          .71
Work Attitude                 4             3.20*        1.28
Unassigned                   13             3.18*        1.23

Note. N = 82.

*  p < .08

** p < .05


Table 2

Summary of Discriminant Analyses

Note. N = 82.

a Degrees of freedom also represent the number of items retained in the predictor equation.